Towards Minimax Online Learning with Unknown Time Horizon

Authors

  • Haipeng Luo
  • Robert E. Schapire
Abstract

We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed-horizon case and then moving on to two unknown-horizon settings: one that assumes the horizon is chosen randomly according to a known distribution, and one that allows the adversary full control over the horizon. For the random-horizon setting with restricted losses, we derive a fully optimal minimax algorithm. For the adversarial-horizon setting, we prove a nontrivial lower bound showing that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random-horizon setting, we then propose a new adaptive algorithm which “pretends” that the horizon is drawn from a distribution in a special family; yet no matter how the actual horizon is chosen, its worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined with and applied to many other techniques, for instance online convex optimization, follow-the-perturbed-leader, the exponential weights algorithm, and first-order bounds. Experiments show that our algorithm outperforms many existing algorithms in an online linear optimization setting.
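To make the setting concrete, here is a minimal sketch (ours, not the paper's minimax algorithm) of the standard horizon-independent baseline in this problem: exponential weights over N experts with the time-varying learning rate η_t = √(ln N / t), which guarantees O(√(T ln N)) worst-case regret simultaneously for every horizon T, without knowing T in advance.

    import numpy as np

    def anytime_hedge(loss_stream, n_experts):
        # Exponential weights with a time-varying learning rate.
        # A standard horizon-independent baseline, NOT the paper's
        # minimax algorithm: eta_t = sqrt(ln(N) / t) gives
        # O(sqrt(T ln N)) worst-case regret for every horizon T.
        cum_loss = np.zeros(n_experts)    # cumulative loss of each expert
        learner_loss = 0.0
        for t, loss in enumerate(loss_stream, start=1):
            eta = np.sqrt(np.log(n_experts) / t)     # horizon-free step size
            shifted = cum_loss - cum_loss.min()      # stabilize the exponent
            weights = np.exp(-eta * shifted)
            p = weights / weights.sum()              # distribution over experts
            learner_loss += p @ loss                 # expected loss this round
            cum_loss += loss
        return learner_loss - cum_loss.min()         # regret vs. best expert

    # Example: 1000 rounds of random [0, 1] losses over 5 experts.
    rng = np.random.default_rng(0)
    stream = (rng.random(5) for _ in range(1000))
    print(anytime_hedge(stream, n_experts=5))

By contrast, the algorithm proposed in the abstract replaces such a fixed step-size schedule with play derived from the random-horizon minimax solution under a horizon distribution drawn from a special family.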


Similar Resources

Towards Minimax Online Learning with Unknown Time Horizon

A. Proof of Theorem 1. We first state a few properties of the function R:
Proposition 1. For any vector M of N dimensions and integer r:
Property 1. R(M, r) = a + R((M_1 − a, . . . , M_N − a), r) for any real number a and r ≥ 0.
Property 2. R(M, r) is non-decreasing in M_i for each i = 1, . . . , N.
Property 3. If r > 0, R(M, r) − R(M, r − 1) ≤ 1/N.
Property 4. If r > 0, and P_i = 1/N + R(M + e_i, r − 1) − R(M, r) fo...
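Under this reading of the excerpt (the dropped signs and inequality directions above are our reconstruction), Properties 2 and 3 are exactly what make the weights in Property 4 nonnegative, as a probability vector requires:

    P_i = 1/N + R(M + e_i, r − 1) − R(M, r)
        ≥ 1/N + R(M, r − 1) − R(M, r)    (Property 2: R is non-decreasing in each coordinate)
        ≥ 1/N − 1/N = 0                  (Property 3: R(M, r) − R(M, r − 1) ≤ 1/N)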


Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and have low computational complexity p...


Minimax Observers for Linear DAEs

In this note we construct finite- and infinite-horizon minimax observers for a linear stationary DAE with deterministic, unknown, but bounded noise. Using generalized Kalman duality and geometric control, we prove that the finite (infinite) horizon observer exists if and only if the DAE is observable (detectable). Remarkably, regularity of the DAE is not required.


Achievability of asymptotic minimax regret by horizon-dependent and horizon-independent strategies

The normalized maximum likelihood distribution achieves minimax coding (log-loss) regret given a fixed sample size, or horizon, n. It generally requires that n be known in advance. Furthermore, extracting the sequential predictions from the normalized maximum likelihood distribution is computationally infeasible for most statistical models. Several computationally feasible alternative strategie...


Wedgelets: Nearly-Minimax Estimation of Edges

We study a simple “Horizon Model” for the problem of recovering an image from noisy data; in this model the image has an edge with α-Hölder regularity. Adopting the viewpoint of computational harmonic analysis, we develop an overcomplete collection of atoms called wedgelets, dyadically organized indicator functions with a variety of locations, scales, and orientations. The wedgelet representati...


Publication date: 2014